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Abstract 

Background: Random-effects meta-analysis is commonly performed by first deriving an estimate of the 
between-study variation, the heterogeneity, and subsequently using this as the basis for combining results, i.e., 
for estimating the effect, the figure of primary interest. The heterogeneity variance estimate however is 
commonly associated with substantial uncertainty, especially in contexts where there are only few studies 
available, such as in small populations and rare diseases. 

Methods: Confidence intervals and tests for the effect may be constructed via a simple normal approximation, 
or via a Student-t distribution, using the Hartung-Knapp-Sidik-Jonkman (HKSJ) approach, which additionally 
uses a refined estimator of variance of the effect estimator. The modified Knapp-Hartung method (mKH) 
applies an ad hoc correction and has been proposed to prevent counterintuitive effects and to yield more 
conservative inference. We performed a simulation study to investigate the behaviour of the standard HKSJ 
and modified mKH procedures in a range of circumstances, with a focus on the common case of meta-analysis 
based on only a few studies. 

Results: The standard HKSJ procedure works well when the treatment effect estimates to be combined are of 
comparable precision, but nominal error levels are exceeded when standard errors vary considerably between 
studies (e.g. due to variations in study size). Application of the modification on the other hand yields more 
conservative results with error rates closer to the nominal level. Differences are most pronounced in the 
common case of few studies of varying size or precision. 

Conclusions: Use of the modified mKH procedure is recommended, especially when only a few studies 
contribute to the meta-analysis and the involved studies' precisions (standard errors) vary. 
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Background 

Random-effects meta-analysis is most commonly per¬ 
formed based on an underlying hierarchical model in¬ 
cluding two unknowns as parameters: the effect fj ,, 
which is the figure of primary interest, and the 
between-study variance (heterogeneity) r 2 , which is 
a nuisance parameter. Inference then is usually done 
sequentially, by first deriving an estimate of the hetero¬ 
geneity variance, r , and then determining the effect 
estimate (i by conditioning on the estimate r 2 [1, 2], 
A large number of different estimators for the hetero¬ 
geneity variance is available (see e.g. [3-6]), and effect 
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estimation may be done based on a simple normal ap¬ 
proximation, or by utilizing a Student-t distribution [7] 
with an additionally refined estimator of the variance 
of (i [8-12]. While the normal model may be moti¬ 
vated by asymptotic arguments, in actual applications 
the number of estimates to be combined is commonly 
small [13, 14] and hence the estimation uncertainty in 
the between-study variance r 2 is substantial, so that 
an adjustment is appropriate and in fact improves op¬ 
erating characteristics [7 12, 15]. 

The problem of deriving estimates from only a small 
number of data sources is a common problem espe¬ 
cially in fields of application where empirical infor¬ 
mation is sparse due to the rarity of the condition in 
question. The rarity of a disease is often accompanied 
with a low (commercial) interest or incentive, which is 
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why such diseases are also known as orphan diseases. 
According to the European Commission, a disease is 
designated orphan status when the prevalence is < 5 
in 10 000 [16]. While by definition any individual rare 
disease has a low prevalence, there is a large number 
of these, eventually affecting a substantial fraction of 
an estimated 6-8% of the total population [17], and 
with that posing a challenge to health care systems 
worldwide. 

The European Medicines Agency acknowledges the 
particular obstacles in rare diseases research but 
points out that there is no fundamental difference be¬ 
tween rare and more common diseases and hence no 
“paradigm change” when it comes to regulatory is¬ 
sues. Because of the common small-sample settings, 
the importance of sophisticated methods is empha¬ 
sized, and meta-analyses of good quality randomised 
controlled clinical trials are still considered the high¬ 
est level of evidence [18]. The problems encountered 
in rare diseases research often call for special statisti¬ 
cal methods, especially with respect to study designs 
[17, 19, 20]. Meta-analyses are particularly important 
in this field due to the lack of large trials, while these 
will commonly still be faced with the problem of small 
numbers of available studies. Between-study hetero¬ 
geneity then is anticipated, since the gathered pieces 
of evidence are likely to differ with respect to study 
designs, types of control groups or treatment alloca¬ 
tion [17, 19-21]. Small studies have in fact epirically 
been found to exhibit more heterogeneity than large 
trials [22], Consequently, the use of methods suitable 
for few studies and marginally significant findings is of 
crucial importance here. 

With an estimated incidence of 2-20 cases per 
100 000 population, juvenile idiopathic arthritis (JIA) 
is an example of a rare disease [23]. In the following, 
we will use a meta-analysis in JIA [24] as a case study 
to illustrate the different methods discussed below. 

In the following sections, we will first describe the 
methods used, then show the results of a simulation 
study, and demonstrate the different types of analyses 
in an example data set, before closing with conclusions 
and recommendations. 


The parameters 9i vary from trial to trial around a 
global mean /j £ B due to some heterogeneity vari¬ 
ance between trials that constitutes an additive vari¬ 
ance component to the model, 

0i ~ N (/j,t 2 ) for i = 1,..., k, (2) 

where r 2 > 0. The model may then be simplified 
by integrating out the parameters (9;, leading to the 
marginal expression 

Hi ~ N(/x, si + t 2 ) for i = (3) 


Among the two unknowns in the model, the overall 
mean fi, the effect, usually is the figure of primary in¬ 
terest, while the heterogeneity variance r 2 constitutes 
a nuisance parameter. When r 2 = 0, the model sim¬ 
plifies to the so-called fixed-effect model [1, 2]. 

The “relative amount of heterogeneity” in a meta¬ 
analysis may be expressed in terms of the measure I 2 , 
which is defined as 


1 2 = 


T l + i> 


( 4 ) 


where s is some kind of “average” standard error 
among the study-specific s,: [25]. In the following sim¬ 
ulation studies, we will determine s 2 as the arithmetic 
mean of squared standard errors. 


Parameter estimation 

If the value of the heterogeneity variance parameter t 2 
were known, the (conditional) maximum-likelihood ef¬ 
fect estimate would result as the weighted average 


h = 


Ej w i Vi 

Ei u ’i 


( 5 ) 


with “inverse variance weights” defined as 


Wi = — - 77 for * = 1,..., k. 

sj+T 2 


( 6 ) 


Methods 

Random-effects meta-analysis 

Meta-analysis is very commonly performed via a 
random-effects approach, utilizing the normal-normal 
hierarchical model. Here the data are given in terms 
of a number k of estimates y, £ 1R that are associ¬ 
ated with some uncertainty given through standard 
errors s,: > 0 that are taken to be known without 
uncertainty. The estimates are assumed to measure 
trial-specific parameters 0, £ 1R: 

( 1 ) 


A common approach to inference within the random- 
effects model is to first estimate the heterogeneity vari¬ 
ance t , and subsequently estimate the effect g, condi¬ 
tional on the heterogeneity estimate. Note that the uy 
are effectively treated as “known” while in fact both 
the s 2 as well as r 2 are only measured with some un¬ 
certainty (that depends on the size/precision of the ith 
individual study and the number of studies k). There 
is a wide range of different heterogeneity estimators 
available (see e.g. [3-6] for more details). In the fol¬ 
lowing we will concentrate on some of the most com¬ 
mon ones, the DerSimonian-Laird (DL) estimator, a 


Hi ~ N(0j, s 2 ) for i = 1,... ,k. 
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moment estimator [26] with acknowledged shortcom- 
ings [27], the restricted maximum likelihood (REML) 
estimator [3, 28], and the Paule-Mandel (PM) estima¬ 
tor, an essentially heuristic approach [29, 30]. Software 
to compute the different estimates is provided e.g. in 
the metafor and met a R packages [31, 32]. 

Confidence intervals and tests 

Normal approximation 

Confidence intervals and, equivalently, tests for the ef¬ 
fect p are commonly constructed using a normal ap¬ 
proximation for the estimate p. The standard error 
of p, conditional on a fixed heterogeneity variance 
value t 2 , is given by 



A confidence interval for the effect p then results via 
a normal approximation as 

A ± ^(1—a/2) (8) 

where £(i- a / 2 ) is the (1—a/2)-quantile of the standard 
normal distribution, and (1 —a) is the nominal cover¬ 
age probability [1, 2]. The normal approximation based 
on a heterogeneity variance estimate f 2 usually works 
well for many studies (large k) and small standard er¬ 
rors (small Si), or negligible heterogeneity (small vari¬ 
ance t 2 ), but tends to be anticonservative otherwise 
[8, 9, 33], 

The Hartung-Knapp-Sidik-Jonkman (HKSJ) method 
Hartung and Knapp [8, 9] and Sidik and Jonkman [10] 
independently introduced an adjusted confidence in¬ 
terval. In order to determine the adjusted interval, first 
the quadratic form 

9 = - j —[ Wi ^ - ^) 2 ( 9 ) 

i 

is computed [8-10]. The adjusted confidence interval 
then results as 

A ± VQ t(fc-l) ; (l-a/2) (10) 

where f(fc_i) : (i- a / 2 ) is the (1 — a/2)-quantile of the 
Student-f distribution with (£;—1) degrees of freedom. 
Note that qcr 2 p is derived from a non-negative and 

unbiased estimator of — [341. Confidence inter¬ 
im™* 

vals based on the normal approximation may easily 
be converted to HKSJ-adjusted ones [12]. The method 
has also been generalized to the cases of multivariate 
nreta-analysis and meta-regeression [35]. 


The modified Knapp-Hartung (mKH) method 
The HKSJ confidence interval (10) tends to be wider 
than the one based on the normal approximation (8), 
since the Student-f quantile is larger than the corre¬ 
sponding normal quantile, while q will tend to be some¬ 
where around unity. However, q may in fact also turn 
out arbitrarily small, and if ,/q < ~ (1 -<» /2) — then 

the modified interval will be shorter than the normal 
one, which may be considered counter-intuitive. A sim¬ 
ple ad hoc modification to the procedure results from 
defining 

q* = max{l, q} (11) 

[11] and using q* instead of q to construct confidence 
intervals and tests. This will ensure a more conserva¬ 
tive procedure. The modification was originally pro¬ 
posed in the rneta-regression context, but the simple 
meta-analysis here constitutes the special case of an 
“intercept-only” regression. 

Note that the PM heterogeneity variance estimator 
is effectively defined by choosing f 2 such that q (equa¬ 
tion (9)) is = 1 (or less, if no solution exists) [29, 30], so 
that for the PM estimator the corresponding q* value 
always equals q* = 1. 

Simulations 

Since a (1—a) confidence interval is supposed to cover 
the true parameter value with probability (1 — a ), the 
calibration of such intervals may be checked by repeat¬ 
edly generating random data based on known param¬ 
eter values and then determining the empirical fre¬ 
quency with which true values are actually covered 
[36]. We performed such a Monte Carlo simulation 
comparing the HKSJ and mKH approaches using the 
setup that was introduced by IntHout et al. [12], Data 
were simulated on a continuous scale, according to 
the random-effects model described above, with study- 
specific standard errors Si set to reflect certain scenar¬ 
ios with respect to the relative size of studies and their 
variation due to estimation uncertainty. Many meta- 
analyses are based on (discrete) count data, where the 
random-effects model assumptions only hold to an ap¬ 
proximation that works well unless event probabilities 
are very low. Alternative methods have been proposed 
to deal with low event probabilities [37, 38], but low- 
event-rate effects were not considered in the present 
investigation. 
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Figure 1 Coverage probabilities of HKSJ and mKH 95% confidence intervals. Probabilities are shown in dependence of the 
number k of studies and the amount of heterogeneity I 2 . The four different scenarios A-D correspond to different amounts of 
imbalance between the study-specific standard errors Sj. The DerSimonian-Laird (DL) method was used for estimation of the 
heterogeneity variance r 2 . 
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These simulations considered four different scenarios, 
namely meta-analyses (A) with trials of equal size, 
(B) with equally sized trials but including one small 
trial, (C) with 50% large and small trials, and (D) 
equally sized trials and one large trial. The sizes of 
“small” and “large” trials (and hence, squared stan¬ 
dard errors s 2 ) here differ by a factor of ten, so that 
the associated standard errors differ by roughly a fac¬ 
tor of 3; for more details see [12, Appendix 2]. Num¬ 
bers of studies k considered here are in the range 
of 2-11, and the true levels of heterogeneity were 
I 2 € {0.00, 0.25, 0.50, 0.75,0.90}. At each combination 
of parameters 10 000 meta-analyses were simulated. 
All simulations were performed using R [39]. 

Results 

Simulations 

Figure 1 shows the estimated coverage probabilities of 
the different confidence intervals based on the DL het¬ 
erogeneity variance estimate. The corresponding figure 
for REML and PM look essentially the same, which is 
in line with the findings in [33, 40]. The HKSJ method 
works very well when the analyzed studies are of equal 
size (i.e., have equal standard errors), as can also be 
shown analytically [41], but coverage decreases in more 
imbalanced settings, especially for small numbers of 
studies. For the case of no heterogeneity (I 2 = 0) the 
HKSJ method also works fine, but if r 2 was known 
a priori, in this case the fixed-effect model should work 
as well. The mKH procedure on the other hand is 
rather conservative for small k, but does not tend to 
inflate the type-I error substantially regardless of the 
underlying study sizes or true heterogeneity. For both 
methods, the dependence on the amount of hetero¬ 
geneity is mostly a matter of whether r 2 is = 0 or > 0. 

Application of the modification obviously tends to 
widen the resulting confidence intervals. The ratio of 
interval lengths (which is equal to is shown in 

Figure 2. Most notably, the effect is largest for small 
heterogeneity and for few studies. 

The modification eventually only makes a difference 
in those cases where q turns out smaller than one. The 
fraction of intervals affected by the modification ranges 
between 31% and 82% for the DL heterogeneity vari¬ 
ance estimator and for the scenarios investigated here, 
with an overall average of 61%. For the other two esti¬ 
mators the fractions are 29-82% with a mean of 62% 
(REML), and 31-91% with a mean of 74% (PM). 
Again, the differences between the different estimators 
are rather small. With respect to the underlying simu¬ 
lation scenario, the probability decreases with increas¬ 
ing heterogeneity, since larger heterogeneity also leads 
to larger values of q. 
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Figure 2 Ratios of lengths of HKSJ and mKH confidence 
intervals. (Same as ). Lengths are shown in dependence 
of the number k of studies and the amount of 
heterogeneity I 2 . Numbers are averaged over all 4 scenarios. 
The DL method was used for r 2 estimation. 


JIA example data (Hinks et al., 2010) 


Prahalad (2006) 
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Figure 3 Hinks et al. (2010) example data. A forest plot 
illustrating the three estimated log odds ratios and 95% 
confidence intervals for the example data due to Hinks et al. 
[24], The estimated amount of heterogeneity variance 
(according to DL, REML and PM estimators) is zero here. At 
the bottom the three alternative combined estimates based on 
normal approximation, HKSJ and mKH approach are shown. 


Application to JIA example data 

Hinks et al. [24] studied the occurrence of a particular 
genetic variant, CCR5, in juvenile idiopathic arthritis 
(JIA) patients in comparison with the general popu¬ 
lation. Their investigation included a meta-analysis of 
a small number (k = 3) of available controlled studies 
looking into the association of JIA with this partic¬ 
ular biomarker. The analysis was based on logarith¬ 
mic odds ratios; the three estimates along with their 
standard errors are shown in a forest plot in Figure 3. 
Here the largest standard error is 50% larger than the 
smallest one. For these data, all three (DL, REML 
and PM) heterogeneity variance estimates turn out 
as f 2 = 0. A zero heterogeneity variance estimate is 
not uncommon, even when the actual heterogeneity is 
in fact substantial [14]. The associated q value is also 
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small with q = 0.31 (^/q = 0.55). The resulting three 
confidence intervals based on normal approximation, 
HKSJ adjustment and mKH method all differ in their 
lengths. The mKH interval is longest, and includes the 
zero log odds ratio (indicating no association between 
genetic marker and disease), while the other two in¬ 
tervals do not include zero. So the choice of procedure 
directly affects conclusions in this example. 

In this example, the three standard errors are rather 
similar (the largest is 50% larger than the smallest), 
but while the heterogeneity variance estimates f 2 are 
all at zero, the 95% Q-profile confidence interval [42] 
ranges up to r 2 = 0.33 2 , corresponding to / 2 = 0.90. 
Within the context of our simulations (Fig. 1), we can¬ 
not tell from the data which of the heterogeneity (/ 2 ) 
scenarios we are in fact in, and as the standard errors 
are not exactly the same, it remains a matter of bal¬ 
ancing the potential consequences whether one would 
rather risk losing on the side of (type I) error proba¬ 
bility or power. 

Discussion 

The HKSJ procedure ensures the coverage probabil¬ 
ity only when the included studies’ standard errors s* 
are similar; for unbalanced settings, the actual error 
probability tends to exceed the targeted one. With 
the standard definition of the correction factor q the 
results may sometimes be counterintuitive, since the 
corresponding CIs may turn out shorter than using 
the simple normal approximation; in fact they may get 
arbitrarily short. In case of no heterogeneity (r = 0) 
the HKSJ method also works well, however practically 
this is of limited relevance, as one can rarely tell (or 
convincingly argue) whether this condition holds. 

The ad hoc modification of the mKH method aims 
at fixing these shortcomings and results in type-I error 
probabilities that are not grossly in excess of the pre¬ 
specified ones. Especially when the standard errors s* 
are of dissimilar magnitude, the mKH method can 
therefore be recommended. For few studies (small fc), 
the modified procedure however tends to be very con¬ 
servative, with very small error probabilities especially 
in the extreme case of meta-analysis of only k = 2 stud¬ 
ies. In this extreme case the choice of methods may 
therefore be considered a matter of a power vs. type-I 
error probability tradeoff. 

While meta-analyses of few studies are a particular 
problem in indications where there is only little evi¬ 
dence available (such as rare diseases), such circum¬ 
stances are not as uncommon as one might expect. 
Turner et al. [13] and Kontopantelis et al. [14] investi¬ 
gated the analyses archived in the Cochrane Database 
and actually found a majority of them to be based on 
as few as k = 2 or k = 3 studies; so these constitute 


highly relevant cases for which the proper control of 
error rates is crucial. 

The properties of either unmodified or modified 
method for the extreme case of k = 2 may be con¬ 
sidered unsatisfactory, as it seems one has the choice 
of either falling short of or exceeding the targeted er¬ 
ror probability; the problem has in fact been regarded 
as effectively unsolved [43]. The poor behaviour may 
be explained by the fact that performing a random- 
effects meta-analysis effectively means the estimation 
of first- and second-order statistics, and it is not overly 
surprising to find that this is a hard task when the 
data consist of as few as two samples that are only 
measured with uncertainty. Bearing this in mind, the 
use of Bayesian methods [44] and the consideration of 
external evidence on the likely magnitude of the het¬ 
erogeneity [45] may be the way forward. 

List of abbreviations 
DL DerSimonian-Laird 

HKSJ Hartung-Knapp-Sidik-Jonkman 
JIA juvenile idiopathic arthritis 

mKH modified Knapp-Hartung 

PM Paule-Mandel 

REML restricted maximum likelihood 
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